AITopics | regularized m-estimator

Regularized M-estimators with nonconvexity: Statistical and algorithmic theory for local optima

Neural Information Processing SystemsSep-30-2025, 13:07:05 GMT

We establish theoretical results concerning all local optima of various regularized M-estimators, where both loss and penalty functions are allowed to be nonconvex. Our results show that as long as the loss function satisfies restricted strong convexity and the penalty function satisfies suitable regularity conditions, any local optimum of the composite objective function lies within statistical precision of the true parameter vector. Our theory covers a broad class of nonconvex objective functions, including corrected versions of the Lasso for errors-in-variables linear models; regression in generalized linear models using nonconvex regularizers such as SCAD and MCP; and graph and inverse covariance matrix estimation. On the optimization side, we show that a simple adaptation of composite gradient descent may be used to compute a global optimum up to the statistical precision epsilon in log(1/epsilon) iterations, which is the fastest possible rate of any first-order method. We provide a variety of simulations to illustrate the sharpness of our theoretical predictions.

nonconvexity, regularized m-estimator, statistical and algorithmic theory, (5 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.61)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.61)

Add feedback

Precise Asymptotics of Bagging Regularized M-estimators

Koriyama, Takuya, Patil, Pratik, Du, Jin-Hong, Tan, Kai, Bellec, Pierre C.

arXiv.org Machine LearningOct-11-2024

We characterize the squared prediction risk of ensemble estimators obtained through subagging (subsample bootstrap aggregating) regularized M-estimators and construct a consistent estimator for the risk. Specifically, we consider a heterogeneous collection of $M \ge 1$ regularized M-estimators, each trained with (possibly different) subsample sizes, convex differentiable losses, and convex regularizers. We operate under the proportional asymptotics regime, where the sample size $n$, feature size $p$, and subsample sizes $k_m$ for $m \in [M]$ all diverge with fixed limiting ratios $n/p$ and $k_m/n$. Key to our analysis is a new result on the joint asymptotic behavior of correlations between the estimator and residual errors on overlapping subsamples, governed through a (provably) contractible nonlinear system of equations. Of independent interest, we also establish convergence of trace functionals related to degrees of freedom in the non-ensemble setting (with $M = 1$) along the way, extending previously known cases for square loss and ridge, lasso regularizers. When specialized to homogeneous ensembles trained with a common loss, regularizer, and subsample size, the risk characterization sheds some light on the implicit regularization effect due to the ensemble and subsample sizes $(M,k)$. For any ensemble size $M$, optimally tuning subsample size yields sample-wise monotonic risk. For the full-ensemble estimator (when $M \to \infty$), the optimal subsample size $k^\star$ tends to be in the overparameterized regime $(k^\star \le \min\{n,p\})$, when explicit regularization is vanishing. Finally, joint optimization of subsample size, ensemble size, and regularization can significantly outperform regularizer optimization alone on the full data (without any subagging).

env loss, estimator, reg, (17 more...)

arXiv.org Machine Learning

2409.15252

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
North America > United States > California > Alameda County > Berkeley (0.14)
North America > United States > Illinois > Cook County > Chicago (0.04)
(2 more...)

Genre: Research Report > New Finding (0.87)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

Add feedback

The Adaptive $\tau$-Lasso: Robustness and Oracle Properties

Mozafari-Majd, Emadaldin, Koivunen, Visa

arXiv.org Machine LearningOct-19-2023

This paper introduces a new regularized version of the robust $\tau$-regression estimator for analyzing high-dimensional datasets subject to gross contamination in the response variables and covariates (explanatory variables). The resulting estimator, termed adaptive $\tau$-Lasso, is robust to outliers and high-leverage points. It also incorporates an adaptive $\ell_1$-norm penalty term, which enables the selection of relevant variables and reduces the bias associated with large true regression coefficients. More specifically, this adaptive $\ell_1$-norm penalty term assigns a weight to each regression coefficient. For a fixed number of predictors $p$, we show that the adaptive $\tau$-Lasso has the oracle property, ensuring both variable-selection consistency and asymptotic normality. Asymptotic normality applies only to the entries of the regression vector corresponding to the true support, assuming knowledge of the true regression vector support. We characterize its robustness via the finite-sample breakdown point and the influence function. We carry out extensive simulations and observe that the class of $\tau$-Lasso estimators exhibits robustness and reliable performance in both contaminated and uncontaminated data settings. We also validate our theoretical findings on robustness properties through simulation experiments. In the face of outliers and high-leverage points, the adaptive $\tau$-Lasso and $\tau$-Lasso estimators achieve the best performance or close-to-best performance in terms of prediction and variable selection accuracy compared to other competing regularized estimators for all scenarios considered in this study. Therefore, the adaptive $\tau$-Lasso and $\tau$-Lasso estimators can be effectively employed for a variety of sparse linear regression problems, particularly in high-dimensional settings and when the data is contaminated by outliers and high-leverage points.

artificial intelligence, machine learning, modeling & simulation, (19 more...)

arXiv.org Machine Learning

2304.0931

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > Canada > British Columbia (0.04)

Genre: Research Report > New Finding (0.87)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

Add feedback

Regularized M-estimators with nonconvexity: Statistical and algorithmic theory for local optima

Loh, Po-Ling, Wainwright, Martin J.

Neural Information Processing SystemsFeb-14-2020, 14:43:57 GMT

We establish theoretical results concerning all local optima of various regularized M-estimators, where both loss and penalty functions are allowed to be nonconvex. Our results show that as long as the loss function satisfies restricted strong convexity and the penalty function satisfies suitable regularity conditions, any local optimum of the composite objective function lies within statistical precision of the true parameter vector. Our theory covers a broad class of nonconvex objective functions, including corrected versions of the Lasso for errors-in-variables linear models; regression in generalized linear models using nonconvex regularizers such as SCAD and MCP; and graph and inverse covariance matrix estimation. On the optimization side, we show that a simple adaptation of composite gradient descent may be used to compute a global optimum up to the statistical precision epsilon in log(1/epsilon) iterations, which is the fastest possible rate of any first-order method. We provide a variety of simulations to illustrate the sharpness of our theoretical predictions.

local optima, regularized m-estimator, statistical and algorithmic theory, (3 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.65)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.64)

Add feedback

On model selection consistency of regularized M-estimators

Lee, Jason D., Sun, Yuekai, Taylor, Jonathan E.

arXiv.org Machine LearningOct-11-2014

Regularized M-estimators are used in diverse areas of science and engineering to fit high-dimensional models with some low-dimensional structure. Usually the low-dimensional structure is encoded by the presence of the (unknown) parameters in some low-dimensional model subspace. In such settings, it is desirable for estimates of the model parameters to be \emph{model selection consistent}: the estimates also fall in the model subspace. We develop a general framework for establishing consistency and model selection consistency of regularized M-estimators and show how it applies to some special cases of interest in statistical learning. Our analysis identifies two key properties of regularized M-estimators, referred to as geometric decomposability and irrepresentability, that ensure the estimators are consistent and model selection consistent.

artificial intelligence, machine learning, selection consistency, (14 more...)

arXiv.org Machine Learning

1305.7477

Country: North America > United States > California (0.28)

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

Add feedback

Regularized M-estimators with nonconvexity: Statistical and algorithmic theory for local optima

Loh, Po-Ling, Wainwright, Martin J.

Neural Information Processing SystemsDec-31-2013

We establish theoretical results concerning all local optima of various regularized M-estimators, where both loss and penalty functions are allowed to be nonconvex. Our results show that as long as the loss function satisfies restricted strong convexity and the penalty function satisfies suitable regularity conditions, any local optimum of the composite objective function lies within statistical precision of the true parameter vector. Our theory covers a broad class of nonconvex objective functions, including corrected versions of the Lasso for errors-in-variables linear models; regression in generalized linear models using nonconvex regularizers such as SCAD and MCP; and graph and inverse covariance matrix estimation. On the optimization side, we show that a simple adaptation of composite gradient descent may be used to compute a global optimum up to the statistical precision epsilon in log(1/epsilon) iterations, which is the fastest possible rate of any first-order method. We provide a variety of simulations to illustrate the sharpness of our theoretical predictions.

artificial intelligence, machine learning, regression, (14 more...)

Neural Information Processing Systems

Country: North America > United States > California > Alameda County > Berkeley (0.14)

Genre: Research Report > New Finding (0.89)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.36)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.32)

Add feedback

A unified framework for high-dimensional analysis of $M$-estimators with decomposable regularizers

Negahban, Sahand, Yu, Bin, Wainwright, Martin J., Ravikumar, Pradeep K.

Neural Information Processing SystemsDec-31-2009

High-dimensional statistical inference deals with models in which the the number ofparameters p is comparable to or larger than the sample size n. Since it is usually impossible to obtain consistent procedures unless p/n 0, a line of recent work has studied models with various types of structure (e.g., sparse vectors; block-structuredmatrices; low-rank matrices; Markov assumptions). In such settings, a general approach to estimation is to solve a regularized convex program (known as a regularized M-estimator) which combines a loss function (measuring how well the model fits the data) with some regularization function that encourages theassumed structure. The goal of this paper is to provide a unified framework forestablishing consistency and convergence rates for such regularized M-estimators under high-dimensional scaling. We state one main theorem and show how it can be used to re-derive several existing results, and also to obtain several new results on consistency and convergence rates. Our analysis also identifies two key properties of loss and regularization functions, referred to as restricted strong convexity and decomposability, that ensure the corresponding regularized M-estimators have fast convergence rates.

artificial intelligence, machine learning, matrix, (17 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.47)

Add feedback

Filters

Collaborating Authors

regularized m-estimator

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

Regularized M-estimators with nonconvexity: Statistical and algorithmic theory for local optima

Precise Asymptotics of Bagging Regularized M-estimators

The Adaptive $\tau$-Lasso: Robustness and Oracle Properties

Regularized M-estimators with nonconvexity: Statistical and algorithmic theory for local optima

On model selection consistency of regularized M-estimators

Regularized M-estimators with nonconvexity: Statistical and algorithmic theory for local optima

A unified framework for high-dimensional analysis of $M$-estimators with decomposable regularizers